[57] ABSTRACT

A method and apparatus for selectively filling a cache
memory with a variable number of data words in response
to the size and type of data transfer requested by
the processor associated with the cache. According to
the present invention, a cache fill of either 16 or 64 bytes
is provided. If there is a cache miss and an 8 byte word
data transfer is requested, the larger fill is provided;
if an 8 byte word data transfer is not requested, the
shorter block of data is provided, resulting in enhanced
performance over a fixed length cache fill.

10 Claims, 1 Drawing Sheet 
VARIABLE LENGTH CACHE FILL

FIELD OF THE INVENTION 

The present invention relates to high speed computer 
processors, in particular, to computer processors having 
cache data and instruction stores. 

BACKGROUND OF THE INVENTION 
Choosing the parameters of a cache fill strategy that
will deliver good performance requires knowledge of 
cache access patterns. 

Long cache fills have the advantage that actual bus
bandwidth rises towards the theoretical peak as read
size increases. But once the read size exceeds the bus
width, satisfying the read requires multiple bus cycles
and thus may increase cache miss tendency.
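
As a rough illustration of this trade-off, the following Python sketch models a read as a fixed number of overhead cycles followed by one bus cycle per bus-width transfer. The model itself, the 8 byte bus width and the 4 cycle overhead are hypothetical values chosen for illustration; none of them is taken from this disclosure.

```python
import math

def bus_cycles(read_bytes, bus_width_bytes=8, overhead_cycles=4):
    """Total bus cycles for one read under the assumed cost model."""
    data_cycles = math.ceil(read_bytes / bus_width_bytes)
    return overhead_cycles + data_cycles

def bus_efficiency(read_bytes, bus_width_bytes=8, overhead_cycles=4):
    """Fraction of the theoretical peak bandwidth achieved by one read.

    Assumed model (not from the disclosure): the peak is one full
    bus-width transfer on every cycle the read occupies the bus.
    """
    cycles = bus_cycles(read_bytes, bus_width_bytes, overhead_cycles)
    return read_bytes / (cycles * bus_width_bytes)

short_fill = bus_efficiency(16)   # 2 data cycles + 4 overhead cycles
long_fill = bus_efficiency(64)    # 8 data cycles + 4 overhead cycles
```

Under these assumed numbers the 64 byte read reaches two thirds of peak bandwidth against one third for the 16 byte read, but it also occupies the bus for 12 cycles instead of 6.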

If the code is making long sequential sweeps through
one or more data structures that are contiguous in memory
(e.g., the sort of code that benefits most directly
from a "vectorizing" compiler and vector hardware),
then typically a long cache fill will be desirable. The
extremely high locality of the stream of data references
means that there is a commensurately high probability
that the additional data read during a long cache fill will
actually be used. Finally, because the performance of
such "vector" applications is frequently a direct function
of memory bandwidth, the improved bus utilization
translates into increased application speed.

When there is more randomness in the stream of data
references, a long cache fill may actually degrade performance.
There are at least two reasons for this. Because
of the lower probability that the additional data
will ever be used, the larger number of bus cycles necessary
to complete a long cache fill may actually lead to
an increased average memory load latency. The larger
fill size also decreases the number of replaceable cache
lines and may therefore hurt performance by increased
thrashing in the use of those lines. In other words, it
increases the probability that the process of servicing
one cache miss will expunge from the cache the contents
of some other line that would have taken a hit in
the near future. When such behavior becomes especially
severe, it is termed "thrashing in the cache".
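
The second effect can be illustrated with a small simulation. The Python sketch below counts misses in a cache that uses one fixed line size throughout. The direct-mapped organization, the 1024 byte capacity and the 80 byte spacing of the references are all assumptions made for the example, since the disclosure does not specify them.

```python
def count_misses(addresses, line_bytes, capacity_bytes=1024):
    """Count misses in a direct-mapped cache with a fixed line size."""
    num_lines = capacity_bytes // line_bytes
    tags = [None] * num_lines          # one resident block per cache line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes     # block number in main memory
        index = block % num_lines      # cache line the block maps to
        if tags[index] != block:
            misses += 1
            tags[index] = block        # the fill expunges the old block
    return misses

# Hypothetical scattered pattern: 32 isolated words, 80 bytes apart,
# swept twice so that the second sweep can hit if nothing was expunged.
scattered = [i * 80 for i in range(32)] * 2

short_misses = count_misses(scattered, line_bytes=16)   # 64 lines available
long_misses = count_misses(scattered, line_bytes=64)    # 16 lines available
```

With 16 byte lines every word keeps its own line, so only the 32 first-touch misses occur. With 64 byte lines the same capacity holds only 16 lines, several words compete for each line, and the second sweep misses again on the contested lines.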

Thus, a conflict exists in providing a system which
services both the rather predictable needs of well-behaved
"vector" applications and the chaotic needs of more
general computations.

SUMMARY OF THE INVENTION 

According to the present invention, two distinct
cache fill sequences of 16 bytes and 64 bytes are provided
and chosen according to the size and address
alignment of the data requested by the associated processor.
No data is transferred from main memory if
there is a cache hit. If there is a cache miss, and either a
quadword is not requested or a quadword not aligned to
a multiple of 64 bytes is requested, a shorter block of 16
bytes is transferred from main memory. If there is a
cache miss and a quadword aligned to a multiple of 64
bytes is requested, a longer block of 64 bytes is transferred
to the cache from main memory. In this context, a
quadword is 8 bytes.
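
The selection rule summarized above can be sketched in a few lines of Python. The function and constant names are hypothetical and serve only to restate the rule; the disclosed embodiment is hardware logic, not software.

```python
SHORT_FILL_BYTES = 16
LONG_FILL_BYTES = 64
QUADWORD_BYTES = 8

def fill_size(cache_hit, request_bytes, address):
    """Return the number of bytes to transfer from main memory."""
    if cache_hit:
        return 0                       # a hit needs no transfer
    is_quadword = request_bytes == QUADWORD_BYTES
    is_aligned = address % LONG_FILL_BYTES == 0
    if is_quadword and is_aligned:
        return LONG_FILL_BYTES         # aligned quadword miss: long fill
    return SHORT_FILL_BYTES            # every other miss: short fill
```

For example, a missed quadword request at address 128 selects the 64 byte fill, while a missed quadword at address 136, or a missed 4 byte request at any address, selects the 16 byte fill.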

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features of the present invention will 
be better understood by reading the following detailed 
description, taken together with the Drawings,
wherein: 

FIG. 1 is a flow chart showing the operation of one 
embodiment of the present invention; and 

FIG. 2 is a block diagram of one embodiment of the 
present invention operable according to the flow chart 
of FIG. 1. 

To keep the mechanics of cache management simple, 
cache lines must adhere to the same natural (or other) 
word alignment strategy as all other aspects of the ar- 
chitecture as defined in co-pending, commonly assigned 
U.S. patent application Ser. No. 07/255,105 entitled 
METHOD AND APPARATUS FOR CONCUR- 
RENT DISPATCH OF INSTRUCTIONS TO MUL- 
TIPLE FUNCTIONAL UNITS, filed Oct. 7, 1988, 
incorporated by reference. 

In recognition of the fact that opcode space is a precious
commodity and of the desirability of making the
presence of a variable length cache fill mechanism totally
transparent to a compiler or an assembly language
programmer, the method and apparatus according to
the present invention choose an appropriate fill size
when a cache miss occurs.

DETAILED DESCRIPTION OF THE 
INVENTION 

According to the operation of one embodiment 100 of
FIG. 2 illustrated in the flow chart 50 of FIG. 1, when
the processor 102 requests a data read or write 52 into
cache 104, the vector reference detection logic 106
responds to the type 107 (e.g., read, write, no-op) and
data size 108 signals, which indicate (54) whether a 4 or
8 byte transaction is requested by the processor 102.

If the data is in the cache 104 as indicated by a cache 
hit (56, 58) provided by a positive tag comparison 110, 
the transaction between the processor 102 and cache 
104 proceeds (60, 62) without data transfer from the 
main memory 120. 

Referring to FIG. 1, if the tag of the processor 102 
requested data was not found (56, 58) by the tag com- 
pare logic, and if the size of the processor requested 
data is 4 bytes, then a block of 16 bytes is loaded 66 into 
the cache from main memory 120. 

As can be seen in FIG. 1, if the processor requested
data is not in the cache 104 as indicated by the tag compare
logic 110, the vector reference logic determines
(64) if the requested data address 112 is 64 byte aligned
(aligned to an integer multiple of 64 bytes). If so, then 64
bytes of data are transferred (68) from main memory 120
to the cache 104. If the processor requested data is not
in the cache and if the requested data address is not a
memory address aligned (i.e., corresponding) to an integer
multiple of 64 bytes, then only a 16-byte data block
is loaded from memory into cache. Registers 111, 113,
115 and 117 provide temporary storage of the command,
address and data signals.
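
The alignment test of step 64 amounts to checking that the six low-order address bits are zero. The Python sketch below shows that check and the resulting block to fetch. It assumes, without support from the text beyond the natural alignment requirement, that a short fill fetches the aligned 16 byte block containing the requested address; the function name is hypothetical.

```python
def fill_range(address, request_bytes):
    """Return (base_address, length) of the block fetched on a miss."""
    is_64_byte_aligned = (address & 0x3F) == 0   # low six bits are clear
    if request_bytes == 8 and is_64_byte_aligned:
        return address, 64                       # long fill from the boundary
    # Assumption: the short fill is the aligned 16 byte block that
    # contains the requested address.
    return address & ~0xF, 16
```

An 8 byte miss at 0x1000 therefore yields (0x1000, 64), while an 8 byte miss at 0x1008 yields (0x1000, 16) under the stated assumption.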

Two properties of the above-described system architecture
and process according to the present invention
are particularly significant. First, the
architecture may be viewed as incorporating "an address
formation and sequencing unit, and an execute
unit". This structure and the availability of both integer
and floating point operations in the execute unit mean
that there is an obvious strategy for decomposing vector
primitives. This strategy will work independent of
the type of data being processed. Second, the present
architecture provides selective 8 byte transfers to and
from an even/odd pair of floating point registers. Since,
as mentioned above, vector loops tend to be limited by
the rate at which operands can be made available from
and results returned to memory, using these 8 byte loads
and stores to move two 4 byte data items (2 single precision
floating point values or 2 long word integers) in a single
cycle makes an enormous difference in the performance
of loops operating on vectors of 4 byte objects. Thus, on
the system architecture according to the present invention,
there is a very high probability that any "vector-like"
loop will be implemented in terms of 8-byte loads
and stores.
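
The pairing of two 4 byte values in one 8 byte transfer can be pictured with the short Python sketch below, which uses the standard struct module to stand in for memory. The little-endian byte order and the sample values are arbitrary assumptions for the illustration.

```python
import struct

values = [1.0, 2.0, 3.0, 4.0]              # four single precision values
memory = struct.pack("<4f", *values)       # 16 bytes standing in for memory

# Each 8 byte load yields an even/odd pair of 4 byte values, so the
# four values are moved with two loads rather than four.
pairs = [struct.unpack_from("<2f", memory, offset)
         for offset in range(0, len(memory), 8)]
```

Here `pairs` holds two tuples, one per 8 byte load, which halves the number of memory operations in a loop over 4 byte objects.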

Finally, since it is typically only vector loops that
would benefit from long cache fill sequences and since
the vast majority of all such loops process memory in
ascending order, we wanted to recognize the possibility
for a long fill only when a cache miss occurred on an
address corresponding to the beginning of a long cache
line. This avoids excessive false triggering of the long
fill in more general code while still permitting it under
exactly those conditions when it will do the most good.
Thus the present invention of providing a long fill for
a cache miss that occurs while performing an 8 byte
load from a long line (64 byte) boundary provides significant
improvement over a single length cache fill.
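
The effect of this rule on an ascending sweep can be traced with the following Python sketch. It is a simplified model under stated assumptions: the cache never evicts anything, loads are naturally aligned, and residency is tracked in 16 byte blocks. The function name and the sample sweeps are hypothetical.

```python
def trace_fills(start, count, load_bytes):
    """Return the fill sizes issued during an ascending sweep of loads."""
    resident = set()       # numbers of the 16 byte blocks held in the cache
    fills = []
    for i in range(count):
        address = start + i * load_bytes
        block = address // 16
        if block in resident:
            continue                              # cache hit, no fill
        if load_bytes == 8 and address % 64 == 0:
            resident.update(range(block, block + 4))
            fills.append(64)                      # long fill at a line boundary
        else:
            resident.add(block)
            fills.append(16)                      # short fill otherwise
    return fills

aligned_sweep = trace_fills(start=0, count=32, load_bytes=8)    # bytes 0..255
offset_sweep = trace_fills(start=16, count=14, load_bytes=8)    # bytes 16..127
word_sweep = trace_fills(start=0, count=64, load_bytes=4)       # bytes 0..255
```

In this model the aligned 8 byte sweep needs four 64 byte fills for 256 bytes, the sweep that begins mid-line issues three 16 byte fills and then switches to a 64 byte fill at the next boundary, and the 4 byte sweep never triggers a long fill.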

The scope of the present invention further includes
an implementation which would support vector access
to descending locations. This would be done by enabling
a long fill during a cache miss on an 8 byte load
from the first 8 bytes or the last 8 bytes of a 64 byte line.
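
A sketch of this variant in Python follows. It reads the second case as the final 8 bytes of the line, which is an interpretation of the text, and it assumes that the long fill always fetches the whole aligned 64 byte line; both function names are hypothetical.

```python
def long_fill_allowed(request_bytes, address):
    """True if a missed load may trigger a long fill in this variant."""
    offset = address % 64              # position within the 64 byte line
    return request_bytes == 8 and offset in (0, 56)

def long_fill_base(address):
    """Base of the aligned 64 byte line containing the address (assumed)."""
    return address - address % 64
```

A descending sweep that first touches address 0x1038 would then fetch the line starting at 0x1000, so that the seven following 8 byte loads at lower addresses hit in the cache.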

Details of related bus structure and methods are provided
in co-pending, commonly assigned U.S. patent
application Ser. No. 07/263,711 entitled A QUASI-FAIR
ARBITRATION SCHEME WITH DEFAULT
OWNER SPEEDUP, filed Oct. 25, 1988 and
incorporated by reference; details of related tag structure
and methods are provided in APOLL-H5XX, entitled
DUPLICATE TAG STORE PURGE QUEUE,
filed concurrently herewith and also incorporated by
reference. Moreover, modifications and substitution of
the above disclosed invention are within the scope of
the present invention, which is not limited except by the
claims which follow.
What is claimed is: 

1. A method of selectively receiving and storing data
blocks of selected lengths of data from a main memory
into a cache memory, said method comprising the steps
of:

requesting a transfer of data between a processor and
a cache memory, the data having a corresponding
indicia and at least one of a first and a second
length;

determining if said indicia corresponding to the data
to be transferred indicates the presence of the data
in said cache;

determining if a virtual address transferred with the
data corresponds to a physical main memory location
that is an integer multiple of a given number of
bytes, if said data is not present in said cache;

selectively transferring from said main memory to
said cache a data block of one of a third and a
fourth length in response to determining if said
virtual address transferred with the data corresponds
to a physical main memory location that is
an integer multiple of said given number of bytes,
wherein

said second length is greater than said first length,

said fourth length is greater than said third length,

said third length is a higher multiple of said second
length, and

said data block of said fourth length is transferred into
said cache memory if data of said second length is
requested and if said virtual address transferred
with the data corresponds to a physical main memory
location that is an integer multiple of said given
number of bytes.

2. The method of claim 1,

wherein said first and second length comprise up to 4
bytes and 8 bytes, respectively, and

said third and fourth length comprise 16 and 64 bytes,
respectively.

3. The method of claim 1, further including the step 
of 

transferring from said main memory data having said 
third length if said first length of data is requested 
and is not present in said cache. 

4. Apparatus for selectively loading data to a cache 
memory from an associated main memory, comprising 

computer means for requesting a selected length data 
transfer with said cache said selected length data 
transfer being one of a first and a second number of 
bytes; 

means for determining the presence of said selected 
length data in said cache; 

means for determining if a virtual address transferred 
with the data corresponds to a physical main mem- 
ory location that is an integer multiple of a given 
number of bytes; 

means for selectively transferring a data block from 
said associated main memory to said cache if said 
selected length data is not in said cache, said data 
block being transferred to said cache having one of 
a third and a fourth length, wherein 

said second length is greater than said first length, 

said fourth length is greater than said third length, 

said third length is a higher multiple of said second 
length, 

said fourth length of data is selectively transferred if 
said second length is requested and is not present in 
said cache and if said virtual address transferred 
with the data corresponds to a physical main mem- 
ory location that is an integer multiple of said given 
number of bytes. 

5. The apparatus of claim 4, wherein said first, sec- 
ond, third and fourth lengths comprise 4, 8, 16 and 64 
bytes respectively and said given number of bytes is 64. 

6. Apparatus for selectively receiving and storing 
from a main memory into a cache memory, data of at 
least one of a first block size and a second block size, 
said apparatus comprising: 

means for requesting a transfer of data between a 
processor and a cache memory, the requested data 
being of one of a first data length and a second data 
length; 

means for determining that the requested data is of 
said first data length; 

means for determining that the requested data is of 
said second data length; 

means for determining that the data of said first data 
length and alternatively of said second data length 
does not reside in said cache memory; 

longfill means for determining that a first address 
transferred with the data of said second data length 
is an integer multiple of a number of bytes of said 
second block size; 
means for loading from main memory into cache
memory a data block of said first block size in 
response to said means for determining that the 
requested data is of said first data length and alter- 
natively in response to said means for determining 
that the requested data is of said second data length 
and in response to said longfill means; and 

means for loading from main memory into cache 
memory a data block of said second block size in 
response to said means for determining that the 
requested data is of said second data length and in
response to said longfill means. 

7. The apparatus of claim 6 wherein said first data 
length is equal to four bytes. 

8. The apparatus of claim 6 wherein said second data 
length is equal to eight bytes. 

9. The apparatus of claim 6 wherein said first block 
size equals 16 bytes. 

10. The apparatus of claim 6 wherein said second 
block size equals 64 bytes. 